Lecture eight
::: {.incremental}
- Context for PCA: \(\mathbf{Y}\)
- In PCA, the focus is simply on creating a set of variables that capture the variance in \(\mathbf{Y}\)
- In FA, we now postulate that the variance in \(\mathbf{Y}\) is driven by an underlying set of unobserved variables
- In other words, we propose a statistical model that explains the covariance structure, and then fit it
- It's a richer, more extensible framework
- This difference is formalised by the maths, which we'll get into
:::
Let \(\mathbf{X} \in \mathcal{R}^p\) have mean \(\mathbf{\mu} \in \mathcal{R}^p\) and be linearly related to a set of \(m\leq p\) unobserved random variables \(\mathbf{F} \in \mathcal{R}^m\) by the equation
\[ \mathbf{X}-\mathbf{\mu} = \mathbf{L}\mathbf{F} + \mathbf{\epsilon}, \]
where \(\mathrm{E}[\mathbf{F}]= \mathbf{0}\), \(\mathrm{Cov}[\mathbf{F}]= \mathbf{I}\), \(\mathrm{E}[\mathbf{\epsilon}]= \mathbf{0}\), \(\mathrm{Cov}[\mathbf{\epsilon}]= \mathbf{\Psi}\) for \(\mathbf{\Psi}\) a diagonal matrix, and \(\mathrm{Cov}[\mathbf{F}, \mathbf{\epsilon}]= \mathbf{0}\).
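These assumptions imply the model covariance \(\mathrm{Cov}[\mathbf{X}] = \mathbf{L}\mathbf{L}' + \mathbf{\Psi}\). A minimal simulation sketch of this, with hypothetical loadings for \(p = 4\) observed variables and \(m = 2\) factors (the numbers here are illustrative, not from the lecture):

```python
import numpy as np

# Hypothetical loading matrix L (p = 4, m = 2) and diagonal Psi.
L = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.1, 0.7],
              [0.2, 0.8]])
Psi = np.diag([0.2, 0.3, 0.4, 0.3])  # specific (unique) variances

# Model-implied covariance: Cov[X] = L L' + Psi, using
# Cov[F] = I, Cov[eps] = Psi, and Cov[F, eps] = 0.
Sigma = L @ L.T + Psi

# Simulate from the factor model (taking mu = 0 for simplicity).
rng = np.random.default_rng(0)
n = 200_000
F = rng.standard_normal((n, 2))                            # factors, Cov = I
eps = rng.standard_normal((n, 4)) * np.sqrt(np.diag(Psi))  # specific errors
X = F @ L.T + eps                                          # X = L F + eps

# The sample covariance should be close to L L' + Psi.
print(np.round(np.cov(X, rowvar=False), 2))
print(np.round(Sigma, 2))
```

With a large \(n\) the sample covariance of the simulated \(\mathbf{X}\) matches \(\mathbf{L}\mathbf{L}' + \mathbf{\Psi}\) closely, which is exactly the structure the model postulates.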
Terminology
The essential difference is that the independent (explanatory) variables are unobserved
Multivariate multiple regression model formulation:
Factor model formulation:
Differences:
\[ \begin{align*} \mathbf{X} - \mathbf{\mu} &= \mathbf{L}\mathbf{F} + \mathbf{\epsilon} \\ &= \mathbf{L}\mathbf{T}\mathbf{T}'\mathbf{F} + \mathbf{\epsilon} \\ &= \mathbf{L}^*\mathbf{F}^* + \mathbf{\epsilon}, \\ \end{align*} \]
for any orthogonal matrix \(\mathbf{T}\) (so that \(\mathbf{T}\mathbf{T}' = \mathbf{I}\)), with \(\mathbf{L}^*=\mathbf{L}\mathbf{T}\) and \(\mathbf{F}^*=\mathbf{T}'\mathbf{F}\). The rotated factors still satisfy the model assumptions, since \(\mathrm{E}[\mathbf{F}^*]=\mathbf{0}\) and \(\mathrm{Cov}[\mathbf{F}^*]=\mathbf{T}'\mathbf{I}\mathbf{T}=\mathbf{I}\), so the loadings are only determined up to an orthogonal rotation.
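A quick numerical check of this rotational indeterminacy, using a hypothetical loading matrix and a 2×2 rotation by 30 degrees as \(\mathbf{T}\) (both chosen for illustration):

```python
import numpy as np

# Hypothetical loadings (p = 4 variables, m = 2 factors).
L = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.1, 0.7],
              [0.2, 0.8]])

# An orthogonal T: plane rotation by 30 degrees, so T T' = I.
theta = np.pi / 6
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

L_star = L @ T  # rotated loadings L* = L T

# L* and L imply the same common-variance contribution L L'.
print(np.allclose(L @ L.T, L_star @ L_star.T))  # True
```

Since \(\mathbf{L}^*\mathbf{L}^{*\prime} = \mathbf{L}\mathbf{T}\mathbf{T}'\mathbf{L}' = \mathbf{L}\mathbf{L}'\), the two loading matrices are observationally equivalent, which is why factor rotations (varimax, etc.) can be applied after fitting without changing the fit.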
(In R, factor models can be fitted with the `factanal` function.)

Johnson, R.A. & Wichern, D.W., *Applied Multivariate Statistical Analysis*, 6th edition, Pearson International Edition, 2007.